Lab 2

Advanced Data Visualization

Instructions

Create a single Quarto file for ALL of Lab 2 (no separate files for Parts 1 and 2).

  • Make sure your final file is carefully formatted, so that each analysis is clear and concise.
  • Be sure your knitted .html file shows all your source code, including any function definitions.

Part One: Identifying Bad Visualizations

If you happen to be bored and looking for a sensible chuckle, you should check out these Bad Visualisations. Looking through these is also a good exercise in cataloging what makes a visualization good or bad.

Dissecting a Bad Visualization

Below is an example of a less-than-ideal visualization from the collection linked above. It comes to us from data provided for the Wellcome Global Monitor 2018 report by the Gallup World Poll:

  1. While there are certainly issues with this image, do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?

It appears that this image is trying to represent the proportion of people in each country who answered affirmatively to the statement “Vaccines are safe”. The data come from 2018 and are grouped by global region. We can see that the median affirmative answer in each global region increases from the bottom of the plot to the top.

  2. List the variables that appear to be displayed in this visualization. Hint: Variables refer to columns in the data.

Variables include:

  • Percentage of people who believe that vaccines are safe
  • Global region
  • Region medians
  • Countries

  3. Now that you’re versed in the grammar of graphics (e.g., ggplot), list the aesthetics used and which variables are mapped to each.

The aesthetics map to variables in the following ways:

  • x is mapped to proportion of the population that believes that vaccines are safe
  • y is mapped to…nothing?
  • color is mapped to global region
  • label is mapped to individual country names
  • Each point represents the proportion of a country’s respondents who agree that vaccines are safe, and is drawn with geom_point()
  • Vertical lines are added using geom_vline() to show regional medians, which increase as one looks higher in the plot

  4. What type of graph would you call this? Meaning, what geom would you use to produce this plot?

This appears to be a scatterplot that creates a quasi-faceting effect by grouping countries by region and then stacking the regions vertically, ordered by each region's median proportion of people who believe vaccines are safe. I would use geom_point() to create this plot.
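
As a rough illustration of the mapping described above, the following is a minimal sketch rather than a reconstruction of the original figure. The data frame `vaccines`, its column names, and all values are made up for illustration, and using a within-region rank for y is only an assumption about how the points were spread out.

```r
library(ggplot2)

# Hypothetical toy data: values are invented for illustration only and are
# NOT taken from the Wellcome Global Monitor dataset.
vaccines <- data.frame(
  country  = c("A", "B", "C", "D", "E", "F"),
  region   = rep(c("Region 1", "Region 2"), each = 3),
  pct_safe = c(0.55, 0.70, 0.85, 0.40, 0.60, 0.65)
)

# One median per region, used for the vertical reference lines.
medians <- aggregate(pct_safe ~ region, data = vaccines, FUN = median)

# Assumption: y carries no variable of its own, so a within-region rank is
# used here purely to spread the points apart vertically.
vaccines$rank_in_region <- ave(vaccines$pct_safe, vaccines$region, FUN = rank)

p_original <- ggplot(
  vaccines,
  aes(x = pct_safe, y = rank_in_region, color = region)
) +
  geom_point() +
  geom_text(aes(label = country), hjust = -0.5, show.legend = FALSE) +
  geom_vline(data = medians, aes(xintercept = pct_safe)) +
  facet_wrap(~ region, ncol = 1)
```

Printing `p_original` draws one panel per region, with a vertical line at that region's median.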

  5. Provide at least four problems or changes that would improve this graph. Please format your changes as bullet points!

Four ways to improve this plot are:

  • Eliminate the legend
  • Double-code the points to further distinguish them beyond color
  • Eliminate the appearance of the y-axis in each facet representing something quantitative
  • Make points clickable so that one can see proportions for individual countries
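
The first three changes listed above could be sketched in ggplot2 roughly as follows. This is one possible approach, not the required solution; the data frame `vaccines` and its values are hypothetical placeholders.

```r
library(ggplot2)

# Hypothetical toy data: made-up values, not the Wellcome data.
vaccines <- data.frame(
  country  = c("A", "B", "C", "D", "E", "F"),
  region   = rep(c("Region 1", "Region 2"), each = 3),
  pct_safe = c(0.55, 0.70, 0.85, 0.40, 0.60, 0.65)
)

p_improved <- ggplot(
  vaccines,
  # Color AND shape both encode region (double coding).
  aes(x = pct_safe, y = reorder(country, pct_safe),
      color = region, shape = region)
) +
  geom_point(size = 3) +
  # Facet strips label the regions, so the legend is redundant.
  facet_grid(region ~ ., scales = "free_y", space = "free_y") +
  # Show proportions as percentages on the x-axis.
  scale_x_continuous(labels = function(x) paste0(round(100 * x), "%")) +
  # The y-axis now lists country names, so it no longer looks quantitative.
  labs(x = "Share agreeing that vaccines are safe", y = NULL) +
  theme_minimal() +
  theme(legend.position = "none")
```

For the fourth change, passing `p_improved` to `plotly::ggplotly()` is one way to get hover tooltips for individual countries.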

Improving the Bad Visualization

The data for the Wellcome Global Monitor 2018 report can be downloaded at the following site: https://wellcome.ac.uk/reports/wellcome-global-monitor/2018

There are two worksheets in the downloaded dataset file. You may need to read them in separately, but you may also just use one if it suffices.
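
Reading the worksheets separately might look like the sketch below, assuming the download is an Excel workbook. The file name `wgm2018.xlsx` and the helper `read_all_sheets()` are hypothetical; substitute the actual name of the downloaded file.

```r
library(readxl)

# Hypothetical helper: reads every worksheet of an Excel workbook into a
# named list of data frames, one element per sheet.
read_all_sheets <- function(path) {
  sheets <- excel_sheets(path)
  stats::setNames(
    lapply(sheets, function(s) read_excel(path, sheet = s)),
    sheets
  )
}

# "wgm2018.xlsx" is a placeholder, not the real name of the download.
path <- "wgm2018.xlsx"
if (file.exists(path)) {
  wgm <- read_all_sheets(path)
  print(names(wgm))  # lists the worksheet names that were read
}
```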

  1. Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.

Part Two: Broad Visualization Improvement

The full Wellcome Global Monitor 2018 report can be found here: https://wellcome.ac.uk/sites/default/files/wellcome-global-monitor-2018.pdf. Surprisingly, the visualization above does not appear in the report despite the citation in the bottom corner of the image!

Second Data Visualization Improvement

For this second plot, you must select a plot that uses maps so you can demonstrate your proficiency with the leaflet package!

  1. Select a data visualization in the report that you think could be improved. Be sure to cite both the page number and figure title. Do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?

Original image (Source: Wellcome Global Monitor, 2018, p. 92.)

science_jobs_original.jpg

  2. List the variables that appear to be displayed in this visualization.

Variables include:

  • Global regions
  • Net impact score of technology on jobs in the next five years (percentage of people who said “Increase” minus percentage of people who said “Decrease”)
  • Positive vs Negative net percentages (color coded as Yellow or Dark Blue)
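
The net impact score described above is a simple difference of two percentages. A minimal sketch with invented numbers (the column names and values are hypothetical, not figures from the report):

```r
# Hypothetical percentages for illustration only (not report figures).
jobs <- data.frame(
  region       = c("Region 1", "Region 2"),
  pct_increase = c(45, 30),
  pct_decrease = c(25, 40)
)

# Net score = % who said "Increase" minus % who said "Decrease".
jobs$net_score <- jobs$pct_increase - jobs$pct_decrease

# The sign of the net score determines the color category in the plot.
jobs$direction <- ifelse(jobs$net_score >= 0, "Positive", "Negative")
```

With these made-up values, Region 1 has a net score of 20 (Positive) and Region 2 has a net score of -10 (Negative).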

  3. Now that you’re versed in the grammar of graphics (ggplot), list the aesthetics used and which variables are specified for each.

  4. What type of graph would you call this?

  5. List all of the problems or things you would improve about this graph.

  6. Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.

Agreement vs Disagreement that Technology Will Increase the Number of Jobs in My Country in the Next Five Years
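
A map of this kind could be built with leaflet along the lines of the sketch below. It is only an outline under stated assumptions: the data frame `jobs_map`, the marker coordinates, and the scores are all placeholders, and circle markers are just one of several ways to place regional values on a map.

```r
library(leaflet)

# Hypothetical toy data: coordinates and scores are placeholders, not
# values from the report.
jobs_map <- data.frame(
  region    = c("Region 1", "Region 2"),
  lng       = c(-60, 20),
  lat       = c(-15, 50),
  net_score = c(20, -10)
)

# Diverging palette: net scores can range from -100 to 100.
pal <- colorNumeric(palette = "RdBu", domain = c(-100, 100))

m <- leaflet(jobs_map) %>%
  addTiles() %>%
  addCircleMarkers(
    lng = ~lng, lat = ~lat,
    color = ~pal(net_score),
    fillOpacity = 0.8,
    # Clicking a marker shows the region name and its net score.
    popup = ~paste0(region, ": net score ", net_score)
  ) %>%
  addLegend(pal = pal, values = c(-100, 100), title = "Net score")
```

Printing `m` in a rendered Quarto document displays the interactive map.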

Third Data Visualization Improvement

For this third plot, you must use one of the other ggplot2 extension packages mentioned this week (e.g., gganimate, plotly, patchwork, cowplot).

  1. Select a data visualization in the report that you think could be improved. Be sure to cite both the page number and figure title. Do your best to tell the story of this graph in words. That is, what is this graph telling you? What do you think the authors meant to convey with it?

Original image (Source: Wellcome Global Monitor, 2018, p. 98.)

religion_science_disagreement.jpg

This plot intends to show how people tend to resolve informational disputes between science and the beliefs of their religion. Respondents had the choice of selecting “science”, “the teachings of my religion”, or “it depends”. Response rates are grouped by answer choice and also by geographical region.

  2. List the variables that appear to be displayed in this visualization.

Variables include:

  • Percentage of people who believe science, their religion, or “it depends” when faced with an informational conflict between the two
  • Global region
  • Region medians
  • Countries

  3. Now that you’re versed in the grammar of graphics (ggplot), list the aesthetics used and which variables are specified for each.

  4. What type of graph would you call this?

  5. List all of the problems or things you would improve about this graph.

  6. Improve the visualization above by either re-creating it with the issues you identified fixed OR by creating a new visualization that you believe tells the same story better.
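
One way to meet the extension-package requirement here is to combine two views with patchwork, as in the sketch below. The data frame `beliefs` and its percentages are invented for illustration, and the choice of a stacked bar plus a faceted bar is only one option among many.

```r
library(ggplot2)
library(patchwork)

# Hypothetical toy data: made-up percentages, not values from the report.
beliefs <- data.frame(
  region   = rep(c("Region 1", "Region 2"), each = 3),
  response = rep(
    c("Science", "The teachings of my religion", "It depends"),
    times = 2
  ),
  pct      = c(30, 55, 15, 50, 30, 20)
)

# View 1: stacked bars compare the overall split within each region.
p_stacked <- ggplot(beliefs, aes(x = pct, y = region, fill = response)) +
  geom_col() +
  labs(x = "Percent of respondents", y = NULL, fill = NULL) +
  theme_minimal() +
  theme(legend.position = "bottom")

# View 2: one panel per region makes individual responses easy to compare.
p_faceted <- ggplot(beliefs, aes(x = pct, y = response)) +
  geom_col() +
  facet_wrap(~ region) +
  labs(x = "Percent of respondents", y = NULL) +
  theme_minimal()

# patchwork: `/` stacks the two plots vertically; plot_annotation() adds
# a shared title.
combined <- p_stacked / p_faceted +
  plot_annotation(title = "When science and religion disagree")
```

Printing `combined` renders both panels under a single title.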